Abstract

This paper describes the conversion of a Hidden Markov Model into a sequential transducer that closely approximates the behavior of the stochastic model. This transformation is especially advantageous for part-of-speech tagging because the resulting transducer can be composed with other transducers that encode correction rules for the most frequent tagging errors. The speed of tagging is also improved. The described methods have been implemented and successfully tested on six languages.



1 Introduction 

Finite-state automata have been successfully applied 
in many areas of computational linguistics. 

This paper describes two algorithms¹ which approximate a Hidden Markov Model (HMM) used for part-of-speech tagging by a finite-state transducer (FST). Beyond the application described here, these algorithms may be useful for any kind of analysis of written or spoken language that is based on both finite-state technology and HMMs, such as corpus analysis or speech recognition. Both algorithms have been fully implemented.

An HMM used for tagging encodes, like a transducer, a relation between two languages. One language contains sequences of ambiguity classes obtained by looking up all words of a sentence in a lexicon. The other language contains sequences of tags obtained by statistically disambiguating the class sequences. From the outside, an HMM tagger behaves like a sequential transducer that deterministically maps every class sequence to a tag sequence, e.g.:

    [DET,PRO]  [ADJ,NOUN]  [ADJ,NOUN]  [END]        (1)
       DET        ADJ         NOUN      END



The aim of the conversion is not to generate FSTs that behave in the same way as HMMs, or as similarly as possible, but rather FSTs that perform tagging as accurately as possible. The motivation for deriving these FSTs from HMMs is that HMMs can be trained and converted with little manual effort.

The tagging speed when using transducers is up to five times higher than when using the underlying HMMs. The main advantage of transforming an HMM is that the resulting transducer can be handled by finite-state calculus. Among other things, it can be composed with transducers that encode:

• correction rules for the most frequent tagging errors, which are either generated automatically (Brill, 1992; Roche and Schabes, 1995) or written manually (Chanod and Tapanainen, 1995), in order to significantly improve tagging accuracy². These rules may include long-distance dependencies not handled by HMM taggers, and can conveniently be expressed by the replace operator (Kaplan and Kay, 1994; Karttunen, 1995; Kempe and Karttunen, 1996).

• further steps of text analysis, e.g. light parsing or extraction of noun phrases or other phrases (Ait-Mokhtar and Chanod, 1997).

These compositions enable complex text analysis to be performed by a single transducer.

An HMM transducer builds on the data (probability matrices) of the underlying HMM. The accuracy of this data has an impact on the tagging accuracy of both the HMM itself and the derived transducer. The training of the HMM can be done on either a tagged or an untagged corpus, and is not a topic of this paper since it is exhaustively described in the literature (Bahl and Mercer, 1976; Church, 1988).

¹ There is a different (unpublished) algorithm by Julian M. Kupiec and John T. Maxwell (p.c.).

² Automatically derived rules require less work than manually written ones but are unlikely to yield better results because they would consider relatively limited context and simple relations only.

An HMM can be identically represented by a 
weighted FST in a straightforward way. We are, 
however, interested in non-weighted transducers. 

2 n-Type Approximation 

This section presents a method that approximates a (1st order) HMM by a transducer, called n-type approximation³.

As in an HMM, we take into account initial probabilities $\pi$, transition probabilities $a$ and class (i.e. observation symbol) probabilities $b$. We do not, however, estimate probabilities over paths. The tag of the first word is selected based on its initial and class probability. The next tag is selected based on its transition probability given the first tag, and its class probability, etc. Unlike in an HMM, once a decision on a tag has been made, it influences the following decisions but is itself irreversible.
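To make this greedy decision procedure concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the author's implementation: a class is represented as the tuple of tags it contains, the tables `PI`, `A` and `B` hold invented toy probabilities, and the function name `n1_tag` is hypothetical. The first tag is chosen by initial and class probability, each later tag by the transition probability from the previously chosen tag and the class probability (the rules formalized in eq. 2 and 3), and no decision is ever revised.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Cls = Tuple[str, ...]  # a class is the tuple of tags it may carry


def n1_tag(classes: Sequence[Cls],
           pi: Dict[str, float],
           a: Dict[str, Dict[str, float]],
           b: Dict[str, Dict[Cls, float]]) -> List[str]:
    """Greedy left-to-right tagging: each tag depends only on the previous,
    already fixed tag and is never revised afterwards."""
    tags: List[str] = []
    prev: Optional[str] = None
    for cls in classes:
        def score(t: str) -> float:
            # first word: initial probability; later words: transition probability
            left = pi.get(t, 0.0) if prev is None else a.get(prev, {}).get(t, 0.0)
            return left * b[t].get(cls, 0.0)  # times the class probability b(c|t)
        prev = max(cls, key=score)
        tags.append(prev)
    return tags


# Hypothetical toy model (numbers invented for illustration only).
DP, AN, END = ("DET", "PRO"), ("ADJ", "NOUN"), ("END",)
PI = {"DET": 0.6, "PRO": 0.1, "ADJ": 0.1, "NOUN": 0.2}
A = {"DET": {"ADJ": 0.4, "NOUN": 0.5},
     "ADJ": {"ADJ": 0.2, "NOUN": 0.7, "END": 0.1},
     "NOUN": {"ADJ": 0.1, "NOUN": 0.2, "END": 0.6}}
B = {"DET": {DP: 0.5}, "PRO": {DP: 0.5}, "ADJ": {AN: 0.8},
     "NOUN": {AN: 0.4}, "END": {END: 1.0}}

print(n1_tag([DP, AN, AN, END], PI, A, B))  # ['DET', 'ADJ', 'NOUN', 'END']
```

With these toy numbers the output reproduces the mapping of example (1).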

A transducer encoding this behaviour can be generated as sketched in figure 1. In this example we have a set of three classes: $c_1$ with the two tags $t_{11}$ and $t_{12}$, $c_2$ with the three tags $t_{21}$, $t_{22}$ and $t_{23}$, and $c_3$ with one tag $t_{31}$. Different classes may contain the same tag, e.g. $t_{12}$ and $t_{23}$ may refer to the same tag.

For every possible pair of a class and a tag (e.g. $c_1{:}t_{12}$ or [ADJ,NOUN]:NOUN) a state is created and labelled with this same pair (fig. 1). An initial state, which does not correspond to any pair, is also created. All states are final, marked by double circles.

For every state, as many outgoing arcs are created as there are classes (three in fig. 1). Each such arc for a particular class points to the most probable pair of this same class. If the arc comes from the initial state, the most probable pair of a class and a tag (destination state) is estimated by:

$$\arg\max_{k}\; p_1(c_i, t_{ik}) = \arg\max_{k}\; \pi(t_{ik})\, b(c_i \mid t_{ik}) \tag{2}$$

If the arc comes from a state other than the initial state, the most probable pair is estimated by:

$$\arg\max_{k}\; p_2(c_i, t_{ik}) = \arg\max_{k}\; a(t_{ik} \mid t_{\mathrm{previous}})\, b(c_i \mid t_{ik}) \tag{3}$$

In the example (fig. 1), $c_1{:}t_{12}$ is the most likely pair of class $c_1$, and $c_2{:}t_{23}$ the most likely pair of class $c_2$ when coming from the initial state, and $c_2{:}t_{21}$ the most likely pair of class $c_2$ when coming from the state of $c_3{:}t_{31}$.

³ Name given by the author.

Every arc is labelled with the same symbol pair as its destination state, with the class symbol in the upper language and the tag symbol in the lower language. E.g. every arc leading to the state of $c_1{:}t_{12}$ is labelled with $c_1{:}t_{12}$.

Finally, all state labels can be deleted since the behaviour described above is encoded in the arc labels and the network structure. The network can be minimized and determinized.

We call the model an n1-type model, the resulting FST an n1-type transducer and the algorithm leading from the HMM to this transducer an n1-type approximation of a 1st order HMM.

Adapted to a 2nd order HMM, this algorithm would give an n2-type approximation. Adapted to a zero order HMM, which means using only the class probabilities $b$, the algorithm would give an n0-type approximation.

n-Type transducers have deterministic states only. 
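The construction of an n1-type transducer can be mimicked with a dictionary-based sketch. The following assumptions are not from the paper: states are Python tuples `(class, tag)` with `None` standing for the initial state, arcs are stored in nested dictionaries, minimization is omitted, and `build_n1_fst`, `apply_fst` and the toy probabilities are hypothetical. Because every state has exactly one arc per class, tagging only follows arcs and never backtracks.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Cls = Tuple[str, ...]                # a class is the tuple of its tags
State = Optional[Tuple[Cls, str]]    # a class:tag pair; None = initial state
Arcs = Dict[State, Dict[Cls, Tuple[Cls, str]]]


def build_n1_fst(classes: Sequence[Cls], pi, a, b) -> Arcs:
    """One state per class:tag pair plus an initial state; from every state
    one arc per class, pointing to the most probable pair of that class."""
    states: List[State] = [None] + [(c, t) for c in classes for t in c]
    arcs: Arcs = {}
    for s in states:
        arcs[s] = {}
        for c in classes:
            def score(t: str) -> float:
                if s is None:                                          # eq. (2)
                    return pi.get(t, 0.0) * b[t].get(c, 0.0)
                return a.get(s[1], {}).get(t, 0.0) * b[t].get(c, 0.0)  # eq. (3)
            # The arc carries the same label c:t as its destination state.
            arcs[s][c] = (c, max(c, key=score))
    return arcs


def apply_fst(arcs: Arcs, seq: Sequence[Cls]) -> List[str]:
    """Follow the single arc labelled with each input class."""
    state: State = None
    tags: List[str] = []
    for c in seq:
        state = arcs[state][c]
        tags.append(state[1])
    return tags


# Hypothetical toy model (numbers invented for illustration only).
DP, AN, END = ("DET", "PRO"), ("ADJ", "NOUN"), ("END",)
PI = {"DET": 0.6, "PRO": 0.1, "ADJ": 0.1, "NOUN": 0.2}
A = {"DET": {"ADJ": 0.4, "NOUN": 0.5},
     "ADJ": {"ADJ": 0.2, "NOUN": 0.7, "END": 0.1},
     "NOUN": {"ADJ": 0.1, "NOUN": 0.2, "END": 0.6}}
B = {"DET": {DP: 0.5}, "PRO": {DP: 0.5}, "ADJ": {AN: 0.8},
     "NOUN": {AN: 0.4}, "END": {END: 1.0}}

fst = build_n1_fst([DP, AN, END], PI, A, B)
print(len(fst))                           # 6 states: initial + 5 class:tag pairs
print(apply_fst(fst, [DP, AN, AN, END]))  # ['DET', 'ADJ', 'NOUN', 'END']
```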

3 s-Type Approximation 

This section presents a method that approximates an HMM by a transducer, called s-type approximation⁴.

Tagging a sentence based on a 1st order HMM includes finding the most probable tag sequence $T$ given the class sequence $C$ of the sentence. The joint probability of $C$ and $T$ can be estimated by:

$$p(C,T) = p(c_1 \ldots c_n,\; t_1 \ldots t_n) = \pi(t_1)\, b(c_1 \mid t_1) \cdot \prod_{i=2}^{n} a(t_i \mid t_{i-1})\, b(c_i \mid t_i) \tag{4}$$
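For comparison with the greedy n-type decisions, eq. (4) can be evaluated directly. In this hypothetical sketch, `joint_prob` computes $p(C,T)$ for one tag sequence and `best_tags` finds the most probable sequence by exhaustive search over all tag combinations. The exhaustive search is exponential in sentence length and stands in for the Viterbi algorithm purely for readability; the toy numbers are invented. With these numbers the optimum happens to coincide with the greedy choices of the n-type model; in general it need not.

```python
import itertools
from typing import Dict, Sequence, Tuple

Cls = Tuple[str, ...]


def joint_prob(classes: Sequence[Cls], tags: Sequence[str],
               pi: Dict[str, float], a: Dict[str, Dict[str, float]],
               b: Dict[str, Dict[Cls, float]]) -> float:
    """p(C, T) as in eq. (4) for a 1st order HMM."""
    p = pi.get(tags[0], 0.0) * b[tags[0]].get(classes[0], 0.0)
    for i in range(1, len(classes)):
        p *= a.get(tags[i - 1], {}).get(tags[i], 0.0) * b[tags[i]].get(classes[i], 0.0)
    return p


def best_tags(classes: Sequence[Cls], pi, a, b) -> Tuple[str, ...]:
    """Most probable tag sequence by brute force (illustration only)."""
    return max(itertools.product(*classes),
               key=lambda tags: joint_prob(classes, tags, pi, a, b))


# Hypothetical toy model (numbers invented for illustration only).
DP, AN, END = ("DET", "PRO"), ("ADJ", "NOUN"), ("END",)
PI = {"DET": 0.6, "PRO": 0.1, "ADJ": 0.1, "NOUN": 0.2}
A = {"DET": {"ADJ": 0.4, "NOUN": 0.5},
     "ADJ": {"ADJ": 0.2, "NOUN": 0.7, "END": 0.1},
     "NOUN": {"ADJ": 0.1, "NOUN": 0.2, "END": 0.6}}
B = {"DET": {DP: 0.5}, "PRO": {DP: 0.5}, "ADJ": {AN: 0.8},
     "NOUN": {AN: 0.4}, "END": {END: 1.0}}

seq = [DP, AN, AN, END]
best = best_tags(seq, PI, A, B)
print(best, round(joint_prob(seq, best, PI, A, B), 6))
# ('DET', 'ADJ', 'NOUN', 'END') 0.016128
```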

The decision on the tag of a particular word cannot be made separately from the other tags. Tags can influence each other over a long distance via transition probabilities. Often, however, it is unnecessary to decide on the tags of the whole sentence at once. In the case of a 1st order HMM, unambiguous classes (containing one tag only), plus the sentence beginning and end positions, constitute barriers to the propagation of HMM probabilities. Two tags with one or more barriers in between do not influence each other's probability.



⁴ Name given by the author.




3.1 s-Type Sentence Model 

To tag a sentence, one can split its class sequence at the barriers into subsequences, then tag them separately and concatenate them again. The result is equivalent to the one obtained by tagging the sentence as a whole.

We distinguish between initial and middle subsequences. The final subsequence of a sentence is equivalent to a middle one if we assume that the sentence end symbol (. or ! or ?) always corresponds to an unambiguous class $c_u$. This allows us to ignore the meaning of the sentence end position as an HMM barrier, because this role is taken by the unambiguous class $c_u$ at the sentence end.

An initial subsequence $C_i$ starts with the sentence-initial position, has any number (incl. zero) of ambiguous classes $c_a$ and ends with the first unambiguous class $c_u$ of the sentence. It can be described by the regular expression⁵:

$$C_i = c_a^{*}\; c_u \tag{5}$$

The joint probability of an initial class subsequence $C_i$ of length $r$, together with an initial tag subsequence $T_i$, can be estimated by:

$$p(C_i, T_i) = \pi(t_1)\, b(c_1 \mid t_1) \cdot \prod_{j=2}^{r} a(t_j \mid t_{j-1})\, b(c_j \mid t_j) \tag{6}$$

A middle subsequence $C_m$ starts immediately after an unambiguous class $c_u$, has any number (incl. zero) of ambiguous classes $c_a$ and ends with the following unambiguous class $c_u$:

$$C_m = c_a^{*}\; c_u \tag{7}$$

⁵ Regular expression operators used in this section are explained in the annex.

For correct probability estimation we have to include the immediately preceding unambiguous class $c_u$, which actually belongs to the preceding subsequence $C_i$ or $C_m$. We thereby obtain an extended middle subsequence:

$$C_m^{o} = c_u^{o}\; c_a^{*}\; c_u \tag{8}$$

The joint probability of an extended middle class subsequence $C_m^{o}$ of length $s$, together with a tag subsequence $T_m^{o}$, can be estimated by:

$$p(C_m^{o}, T_m^{o}) = b(c_1 \mid t_1) \cdot \prod_{j=2}^{s} a(t_j \mid t_{j-1})\, b(c_j \mid t_j) \tag{9}$$
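Splitting a sentence into $C_i$ and $C_m^{o}$ only requires locating the unambiguous classes. In this sketch, which makes an assumption about representation and is not the author's code, a class is a tuple of tags and counts as unambiguous when it has exactly one element; `split_sentence` is a hypothetical name. The example uses the class sequence of the sentence in fig. 2.

```python
from typing import List, Sequence, Tuple

Cls = Tuple[str, ...]


def is_unambiguous(c: Cls) -> bool:
    """A class is unambiguous when it contains exactly one tag."""
    return len(c) == 1


def split_sentence(classes: Sequence[Cls]) -> Tuple[List[Cls], List[List[Cls]]]:
    """Split a sentence's class sequence into the initial subsequence
    C_i = c_a* c_u (eq. 5) and the extended middle subsequences
    C_m^o = c_u^o c_a* c_u (eq. 8)."""
    if not classes or not is_unambiguous(classes[-1]):
        raise ValueError("the sentence must end with an unambiguous class")
    barriers = [i for i, c in enumerate(classes) if is_unambiguous(c)]
    initial = list(classes[: barriers[0] + 1])
    # Each barrier closes one subsequence and is repeated as the (extended)
    # head of the next, so neighbouring subsequences overlap in one class.
    middles = [list(classes[i : j + 1]) for i, j in zip(barriers, barriers[1:])]
    return initial, middles


fig2 = [("AT",), ("NN", "VB"), ("IN",), ("VBD", "VBN"), ("IN", "RB"),
        ("CS", "DT", "WPS"), ("NN", "VB", "VBD"), ("IN",), ("NN", "VB"),
        ("SENT",)]
initial, middles = split_sentence(fig2)
print(initial)                    # [('AT',)]
print([len(m) for m in middles])  # [3, 6, 3]
```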

3.2 Construction of an s-Type Transducer 

To build an s-type transducer, a large number of initial class subsequences $C_i$ and extended middle class subsequences $C_m^{o}$ are generated in one of the following two ways:

(a) Extraction from a corpus 

Based on a lexicon and a guesser, we annotate an untagged training corpus with class labels. From every sentence, we extract the initial class subsequence $C_i$ that ends with the first unambiguous class $c_u$ (eq. 5), and all extended middle subsequences $C_m^{o}$ ranging from any unambiguous class $c_u$ (in the sentence) to the following unambiguous class (eq. 8).



A frequency constraint (threshold) may be imposed on the subsequence selection, so that the only subsequences retained are those that occur at least a certain number of times in the training corpus⁶.
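Extraction from a corpus with a frequency threshold can be sketched with a counter. The names `subsequences` and `extract` and the toy corpus are hypothetical; `min_freq` plays the role of the threshold (written F1, F2, ... in the tables below), and the split follows eq. (5) and (8). Every sentence is assumed to end with an unambiguous class.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

Cls = Tuple[str, ...]
Subseq = Tuple[Cls, ...]


def subsequences(classes: Sequence[Cls]) -> Tuple[Subseq, List[Subseq]]:
    """Initial subsequence (eq. 5) and extended middle subsequences (eq. 8)."""
    barriers = [i for i, c in enumerate(classes) if len(c) == 1]
    initial = tuple(classes[: barriers[0] + 1])
    middles = [tuple(classes[i : j + 1]) for i, j in zip(barriers, barriers[1:])]
    return initial, middles


def extract(corpus: Sequence[Sequence[Cls]],
            min_freq: int = 1) -> Tuple[Set[Subseq], Set[Subseq]]:
    """Keep only subsequences that occur at least min_freq times."""
    init_counts: Counter = Counter()
    mid_counts: Counter = Counter()
    for sentence in corpus:
        initial, middles = subsequences(sentence)
        init_counts[initial] += 1
        mid_counts.update(middles)

    def keep(counts: Counter) -> Set[Subseq]:
        return {s for s, n in counts.items() if n >= min_freq}

    return keep(init_counts), keep(mid_counts)


AT, NV, IN, SENT = ("AT",), ("NN", "VB"), ("IN",), ("SENT",)
corpus = [[AT, NV, SENT], [AT, NV, SENT], [NV, IN, SENT]]
initials, middles = extract(corpus, min_freq=2)
print(initials)  # {(('AT',),)}
print(middles)   # {(('AT',), ('NN', 'VB'), ('SENT',))}
```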

(b) Generation of possible subsequences 

Based on the set of classes, we generate all possible initial and extended middle class subsequences, $C_i$ and $C_m^{o}$ (eq. 5 and 8), up to a defined length.
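Generating all possible subsequences amounts to a Cartesian product over the ambiguous classes. The text above only speaks of "a defined length". As a labelled simplification, this sketch assumes that the bound `max_amb` counts the ambiguous classes $c_a$ only. `generate` and the toy class lists are hypothetical. The number of subsequences grows exponentially with the bound, which is consistent with the steep growth in transducer size reported in the experiments.

```python
import itertools
from typing import List, Sequence, Tuple

Cls = Tuple[str, ...]
Subseq = Tuple[Cls, ...]


def generate(ambiguous: Sequence[Cls], unambiguous: Sequence[Cls],
             max_amb: int) -> Tuple[List[Subseq], List[Subseq]]:
    """All initial subsequences c_a* c_u and extended middle subsequences
    c_u^o c_a* c_u with at most max_amb ambiguous classes (assumed bound)."""
    initials: List[Subseq] = []
    middles: List[Subseq] = []
    for k in range(max_amb + 1):
        # product(..., repeat=0) yields one empty tuple: no ambiguous class
        for amb in itertools.product(ambiguous, repeat=k):
            for end in unambiguous:
                initials.append(amb + (end,))
                for head in unambiguous:
                    middles.append((head,) + amb + (end,))
    return initials, middles


AMB = [("NN", "VB"), ("IN", "RB")]
UNAMB = [("AT",), ("IN",)]
initials, middles = generate(AMB, UNAMB, max_amb=1)
print(len(initials), len(middles))  # 6 12
```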

Every class subsequence $C_i$ or $C_m^{o}$ is first disambiguated based on a 1st order HMM, using the Viterbi algorithm (Viterbi, 1967; Rabiner, 1990) for efficiency, and then linked to its most probable tag subsequence $T_i$ or $T_m^{o}$ by means of the cross product operation:

$$S_i = C_i \;\texttt{.x.}\; T_i = c_1{:}t_1\; c_2{:}t_2\; \ldots\; c_n{:}t_n \tag{10}$$

$$S_m^{o} = C_m^{o} \;\texttt{.x.}\; T_m^{o} = c_1{:}t_1\; c_2{:}t_2\; \ldots\; c_n{:}t_n \tag{11}$$
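The disambiguation step can be sketched with a textbook Viterbi search over one subsequence. This is an illustrative sketch, not the author's implementation; `viterbi`, its `initial` flag and the toy probabilities are hypothetical. With `initial=True` the first tag is scored with $\pi(t_1)\,b(c_1|t_1)$ as in eq. (6), otherwise with $b(c_1|t_1)$ alone as in eq. (9). The toy numbers are chosen so that the result is the tag sequence DET ADJ ADJ NOUN used in example (12); zipping classes and tags then gives the class:tag pairs of eq. (11).

```python
from typing import Dict, List, Sequence, Tuple

Cls = Tuple[str, ...]


def viterbi(classes: Sequence[Cls], pi: Dict[str, float],
            a: Dict[str, Dict[str, float]], b: Dict[str, Dict[Cls, float]],
            initial: bool) -> List[str]:
    """Most probable tag subsequence for one class subsequence.
    initial=True scores t1 with pi(t1) b(c1|t1) as in eq. (6);
    initial=False scores t1 with b(c1|t1) only, as in eq. (9)."""
    delta: Dict[str, float] = {
        t: (pi.get(t, 0.0) if initial else 1.0) * b[t].get(classes[0], 0.0)
        for t in classes[0]}
    backptrs: List[Dict[str, str]] = []
    for c in classes[1:]:
        new_delta: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for t in c:
            # best predecessor of t given the scores of the previous step
            prev = max(delta, key=lambda p: delta[p] * a.get(p, {}).get(t, 0.0))
            new_delta[t] = delta[prev] * a.get(prev, {}).get(t, 0.0) * b[t].get(c, 0.0)
            ptr[t] = prev
        delta = new_delta
        backptrs.append(ptr)
    tags = [max(delta, key=lambda t: delta[t])]
    for ptr in reversed(backptrs):  # follow back-pointers right to left
        tags.append(ptr[tags[-1]])
    return tags[::-1]


# Hypothetical toy model; D, AN, N are the classes of example (12).
D, AN, N = ("DET",), ("ADJ", "NOUN"), ("NOUN",)
A = {"DET": {"ADJ": 0.4, "NOUN": 0.5},
     "ADJ": {"ADJ": 0.2, "NOUN": 0.7},
     "NOUN": {"ADJ": 0.1, "NOUN": 0.2}}
B = {"DET": {D: 1.0}, "ADJ": {AN: 0.8}, "NOUN": {AN: 0.4, N: 0.3}}

seq = [D, AN, AN, N]
tags = viterbi(seq, {}, A, B, initial=False)
print(tags)                  # ['DET', 'ADJ', 'ADJ', 'NOUN']
print(list(zip(seq, tags)))  # the class:tag pairs c_j:t_j
```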

In all extended middle subsequences $S_m^{o}$, e.g.:

    [DET]  [ADJ,NOUN]  [ADJ,NOUN]  [NOUN]        (12)
     DET      ADJ         ADJ       NOUN

the first class symbol on the upper side and the first tag symbol on the lower side will be marked as an extension that does not really belong to the middle sequence but which is necessary to disambiguate it correctly. Example (12) becomes:

    0.[DET]  [ADJ,NOUN]  [ADJ,NOUN]  [NOUN]      (13)
     0.DET      ADJ         ADJ       NOUN



We then build the union ${}^{\cup}S_i$ of all initial subsequences $S_i$ and the union ${}^{\cup}S_m^{o}$ of all extended middle subsequences $S_m^{o}$, and formulate a preliminary sentence model:

$${}^{\cup}S^{o} = {}^{\cup}S_i \;\; \big[\, {}^{\cup}S_m^{o} \,\big]^{*} \tag{14}$$



in which all middle subsequences $S_m^{o}$ are still marked and extended in the sense that all occurrences of all unambiguous classes are mentioned twice: once unmarked as $c_u$ at the end of every sequence $C_i$ or $C_m^{o}$, and a second time marked as $c_u^{o}$ at the beginning of every following sequence $C_m^{o}$. The upper side of the sentence model ${}^{\cup}S^{o}$ describes the complete (but extended) class sequences of possible sentences, and the lower side of ${}^{\cup}S^{o}$ describes the corresponding (extended) tag sequences.

⁶ The frequency constraint may prevent the encoding of rare subsequences, which would increase the size of the transducer without contributing much to the tagging accuracy.

To ensure a correct concatenation of initial and middle subsequences, we formulate a concatenation constraint for the classes:

$$R_c = \bigcap_{u} \big[\, {\sim}\$\,[\, \backslash c_u \;\; c_u^{o} \,] \,\big] \tag{15}$$

stating that every middle subsequence must begin with the same marked unambiguous class (e.g. 0.[DET]) which occurs unmarked as $c_u$ (e.g. [DET]) at the end of the preceding subsequence, since both symbols refer to the same occurrence of this unambiguous class.

Having ensured correct concatenation, we delete all marked classes on the upper side of the relation by means of

$$D_c = [\,] \leftarrow \bigcup_{u} c_u^{o} \tag{16}$$

and all marked tags on the lower side by means of

$$D_t = \bigcup t^{o} \rightarrow [\,] \tag{17}$$



By composing the above relations with the preliminary sentence model, we obtain the final sentence model:

$$S = D_c \;\texttt{.o.}\; R_c \;\texttt{.o.}\; {}^{\cup}S^{o} \;\texttt{.o.}\; D_t \tag{18}$$



We call the model an s-type model, the corresponding FST an s-type transducer, and the whole algorithm leading from the HMM to the transducer an s-type approximation of an HMM.

The s-type transducer tags any corpus which contains only known subsequences in exactly the same way, i.e. with the same errors, as the corresponding HMM tagger does. However, since an s-type transducer is incomplete, it cannot tag sentences with one or more class subsequences not contained in the union of the initial or middle subsequences.
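What the final sentence model computes can be emulated with two lookup tables, one for initial and one for extended middle subsequences. This sketch is an assumption-laden stand-in for the composition of eq. (18), not a finite-state implementation: `s_type_tag` and the tables are hypothetical, the tables would hold Viterbi results, and dropping the first tag of every middle subsequence plays the role of deleting the marked symbols. Because neighbouring subsequences overlap in their shared unambiguous class, they concatenate consistently, which is what the concatenation constraint enforces in the transducer. An unknown subsequence yields `None`, mirroring the incompleteness noted above.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Cls = Tuple[str, ...]
Table = Dict[Tuple[Cls, ...], Tuple[str, ...]]


def s_type_tag(classes: Sequence[Cls], initial_tbl: Table,
               middle_tbl: Table) -> Optional[List[str]]:
    """Tag a sentence from precomputed subsequences, or return None when a
    subsequence is unknown (the s-type model is incomplete)."""
    barriers = [i for i, c in enumerate(classes) if len(c) == 1]
    if not barriers or barriers[-1] != len(classes) - 1:
        return None  # the sentence must end in an unambiguous class
    head = tuple(classes[: barriers[0] + 1])
    if head not in initial_tbl:
        return None
    tags = list(initial_tbl[head])
    for i, j in zip(barriers, barriers[1:]):
        mid = tuple(classes[i : j + 1])
        if mid not in middle_tbl:
            return None
        # The first (marked) tag repeats the last tag already emitted,
        # so it is dropped, as D_c and D_t do for the marked symbols.
        tags.extend(middle_tbl[mid][1:])
    return tags


D, AN, N, S = ("DET",), ("ADJ", "NOUN"), ("NOUN",), ("SENT",)
INITIAL = {(D,): ("DET",)}
MIDDLE = {(D, AN, AN, N): ("DET", "ADJ", "ADJ", "NOUN"),
          (N, S): ("NOUN", "SENT")}

print(s_type_tag([D, AN, AN, N, S], INITIAL, MIDDLE))
# ['DET', 'ADJ', 'ADJ', 'NOUN', 'SENT']
print(s_type_tag([D, AN, N, S], INITIAL, MIDDLE))  # None: (D, AN, N) unknown
```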

3.3 Completion of an s-Type Transducer 

An incomplete s-type transducer $S$ can be completed with subsequences from an auxiliary, complete n-type transducer $N$ as follows:

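First, we extract the union of initial and the union of extended middle subsequences, ${}^{\cup}_{s}S_i$ and ${}^{\cup}_{s}S_m^{o}$, from the primary s-type transducer $S$, and the unions ${}^{\cup}_{n}S_i$ and ${}^{\cup}_{n}S_m^{o}$ from the auxiliary n-type transducer $N$. To extract the union ${}^{\cup}_{n}S_i$ of initial subsequences we use the following filter:

$$F_{S_i} = \big[\, \backslash (c_u, t) \,\big]^{*} \;\; (c_u, t) \;\; \big[\, ?{:}[\,] \,\big]^{*} \tag{19}$$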



where $(c_u, t)$ is the 1-level format⁷ of the symbol pair $c_u{:}t$. The extraction takes place by

$${}^{\cup}_{n}S_i = \big[\, N\texttt{.1L} \;\texttt{.o.}\; F_{S_i} \,\big]\texttt{.l.2L} \tag{20}$$



where the transducer $N$ is first converted into 1-level format, then composed with the filter $F_{S_i}$ (eq. 19). We extract the lower side of this composition, where every sequence of $N\texttt{.1L}$ remains unchanged from the beginning up to the first occurrence of an unambiguous class $c_u$. Every following symbol is mapped to the empty string by means of $[\,?{:}[\,]\,]^{*}$ (eq. 19). Finally, the extracted lower side is again converted into 2-level format.
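The effect of the filter in eq. (19) on individual paths can be illustrated in Python. Since the language of a transducer is usually infinite, this hypothetical sketch works on an explicit, finite list of 1-level paths (pairs `(class, tag)`), which is an assumption for illustration only; `initial_prefix` and `initial_union` are invented names.

```python
from typing import List, Optional, Sequence, Set, Tuple

Cls = Tuple[str, ...]
Pair = Tuple[Cls, str]   # 1-level symbol (c, t) for the pair c:t
Path = Tuple[Pair, ...]


def initial_prefix(path: Sequence[Pair]) -> Optional[Path]:
    """Effect of F_Si (eq. 19) on one path: keep everything up to and
    including the first pair with an unambiguous class, map the rest to
    epsilon, and reject paths without any unambiguous class."""
    for k, (cls, _tag) in enumerate(path):
        if len(cls) == 1:
            return tuple(path[: k + 1])
    return None


def initial_union(paths: Sequence[Sequence[Pair]]) -> Set[Path]:
    """Lower side of the composition, collected over a finite set of paths."""
    result: Set[Path] = set()
    for p in paths:
        prefix = initial_prefix(p)
        if prefix is not None:
            result.add(prefix)
    return result


AN, D, N = ("ADJ", "NOUN"), ("DET",), ("NOUN",)
paths: List[List[Pair]] = [
    [(AN, "ADJ"), (D, "DET"), (AN, "NOUN"), (N, "NOUN")],
    [(AN, "ADJ"), (D, "DET"), (N, "NOUN")],
    [(AN, "NOUN"), (AN, "ADJ")],  # no unambiguous class: rejected
]
print(len(initial_union(paths)))  # 1: both accepted paths share one prefix
```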

The extraction of the union of extended middle subsequences is performed in a similar way.

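We then make the joint unions of initial and extended middle subsequences:

$${}^{\cup}S_i = {}^{\cup}_{s}S_i \;\big|\; \Big[\, \big[\, {}^{\cup}_{n}S_i\texttt{.u} - {}^{\cup}_{s}S_i\texttt{.u} \,\big] \;\texttt{.o.}\; {}^{\cup}_{n}S_i \,\Big] \tag{21}$$

$${}^{\cup}S_m^{o} = {}^{\cup}_{s}S_m^{o} \;\big|\; \Big[\, \big[\, {}^{\cup}_{n}S_m^{o}\texttt{.u} - {}^{\cup}_{s}S_m^{o}\texttt{.u} \,\big] \;\texttt{.o.}\; {}^{\cup}_{n}S_m^{o} \,\Big] \tag{22}$$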



In both cases (eq. 21 and 22) we union all subsequences from the principal model $S$ with all those subsequences from the auxiliary model $N$ that are not in $S$.

Finally, we generate the completed s+n-type transducer from the joint unions of subsequences ${}^{\cup}S_i$ and ${}^{\cup}S_m^{o}$, as described above (eq. 13-18).

A transducer completed in this way disambiguates all subsequences known to the principal, incomplete s-type model exactly as the underlying HMM does, and all other subsequences as the auxiliary n-type model does.
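The behaviour of the completed s+n-type model can be sketched by falling back to greedy n1-type decisions whenever a subsequence is missing from the s-type tables. All names and numbers are hypothetical. The table entry shown equals the Viterbi result for the same toy model. The fallback reproduces what an n1-type transducer does after an unambiguous class, whose single tag fixes the starting point of the next decision. In this toy case the fallback disagrees with the table entry, which shows that the s-type entry takes precedence whenever it exists.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Cls = Tuple[str, ...]
Table = Dict[Tuple[Cls, ...], Tuple[str, ...]]


def greedy(sub: Sequence[Cls], pi, a, b) -> List[str]:
    """n1-type behaviour (eq. 2 and 3) on one subsequence."""
    tags: List[str] = []
    prev: Optional[str] = None
    for c in sub:
        def score(t: str) -> float:
            left = pi.get(t, 0.0) if prev is None else a.get(prev, {}).get(t, 0.0)
            return left * b[t].get(c, 0.0)
        prev = max(c, key=score)
        tags.append(prev)
    return tags


def s_plus_n_tag(classes: Sequence[Cls], initial_tbl: Table, middle_tbl: Table,
                 pi, a, b) -> List[str]:
    """Use the s-type tables where possible, the n1-type behaviour elsewhere.
    Assumes the sentence ends with an unambiguous class."""
    barriers = [i for i, c in enumerate(classes) if len(c) == 1]
    head = tuple(classes[: barriers[0] + 1])
    tags = list(initial_tbl.get(head) or greedy(head, pi, a, b))
    for i, j in zip(barriers, barriers[1:]):
        mid = tuple(classes[i : j + 1])
        # The head of a middle subsequence is unambiguous, so greedy() fixes
        # its tag and continues as the n1-type transducer would.
        mid_tags = middle_tbl.get(mid) or greedy(mid, pi, a, b)
        tags.extend(mid_tags[1:])  # skip the marked head
    return tags


D, AN, N = ("DET",), ("ADJ", "NOUN"), ("NOUN",)
PI = {"DET": 0.6}
A = {"DET": {"ADJ": 0.4, "NOUN": 0.5},
     "ADJ": {"ADJ": 0.2, "NOUN": 0.7},
     "NOUN": {"ADJ": 0.1, "NOUN": 0.2}}
B = {"DET": {D: 1.0}, "ADJ": {AN: 0.8}, "NOUN": {AN: 0.4, N: 0.3}}
INITIAL = {(D,): ("DET",)}
MIDDLE = {(D, AN, AN, N): ("DET", "ADJ", "ADJ", "NOUN")}

sentence = [D, AN, AN, N]
print(s_plus_n_tag(sentence, INITIAL, MIDDLE, PI, A, B))  # s-type entry
print(s_plus_n_tag(sentence, INITIAL, {}, PI, A, B))      # n1-type fallback
```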

4 An Implemented Finite-State Tagger

The implemented tagger requires three transducers, which represent a lexicon, a guesser and any of the above-mentioned approximations of an HMM.

All three transducers are sequential, i.e. deterministic on the input side.

Both the lexicon and the guesser unambiguously map a surface form of any word that they accept to the corresponding class of tags (fig. 2, col. 1 and 2). First, the word is looked up in the lexicon. If this fails, it is looked up in the guesser. If this also fails, it gets the label [UNKNOWN], which associates the word with the tag class of unknown words. Tag probabilities in this class are approximated by the tags of words that appear only once in the training corpus.

⁷ 1-Level and 2-level format are explained in the annex.

As soon as an input token gets labelled with the tag class of sentence end symbols (fig. 2: [SENT]), the tagger stops reading words from the input. At this point, the tagger has read and stored the words of a whole sentence (fig. 2, col. 1) and generated the corresponding sequence of classes (fig. 2, col. 2).

The class sequence is now deterministically mapped to a tag sequence (fig. 2, col. 3) by means of the HMM transducer. The tagger outputs the stored word and tag sequence of the sentence, and continues in the same way with the remaining sentences of the corpus.
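The control flow of the tagger can be sketched as a Python generator. The lexicon, the suffix-based guesser and the sentence-level mapping `hmm_fst` below are hypothetical stand-ins (the real components are transducers), and class labels are plain strings such as `"[NN,VB]"`. Words after the last sentence end symbol are not output in this sketch.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple


def tag_corpus(tokens: Iterable[str],
               lexicon: Dict[str, str],
               guesser: Callable[[str], Optional[str]],
               hmm_fst: Callable[[Tuple[str, ...]], List[str]]
               ) -> Iterator[Tuple[str, str]]:
    """Buffer words until a sentence end class, then map the sentence's class
    sequence to tags and yield (word, tag) pairs."""
    words: List[str] = []
    classes: List[str] = []
    for word in tokens:
        # Lexicon first, then the guesser, else the class of unknown words.
        cls = lexicon.get(word) or guesser(word) or "[UNKNOWN]"
        words.append(word)
        classes.append(cls)
        if cls == "[SENT]":
            yield from zip(words, hmm_fst(tuple(classes)))
            words, classes = [], []


# Hypothetical components, reduced to the sentence of fig. 2.
LEXICON = {"The": "[AT]", "share": "[NN,VB]", "of": "[IN]", "within": "[IN,RB]",
           "that": "[CS,DT,WPS]", "span": "[NN,VB,VBD]", "time": "[NN,VB]",
           ".": "[SENT]"}


def guesser(word: str) -> Optional[str]:
    # Toy suffix rule standing in for a real guesser transducer.
    return "[VBD,VBN]" if word.endswith("ed") else None


SENTENCE_TAGS = {
    ("[AT]", "[NN,VB]", "[IN]", "[VBD,VBN]", "[IN,RB]", "[CS,DT,WPS]",
     "[NN,VB,VBD]", "[IN]", "[NN,VB]", "[SENT]"):
        ["AT", "NN", "IN", "VBD", "IN", "DT", "VBD", "IN", "NN", "SENT"],
}


def hmm_fst(classes: Tuple[str, ...]) -> List[str]:
    # Stand-in for the HMM transducer: a lookup of whole class sequences.
    return SENTENCE_TAGS[classes]


tokens = "The share of tripled within that span of time .".split()
for word, tag in tag_corpus(tokens, LEXICON, guesser, hmm_fst):
    print(word, tag)
```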



    The        [AT]            AT
    share      [NN,VB]         NN
    of         [IN]            IN
    tripled    [VBD,VBN]       VBD
    within     [IN,RB]         IN
    that       [CS,DT,WPS]     DT
    span       [NN,VB,VBD]     VBD
    of         [IN]            IN
    time       [NN,VB]         NN
    .          [SENT]          SENT

Figure 2: Tagging a sentence



5 Experiments and Results 

This section compares different n-type and s-type transducers with each other and with the underlying HMM.

The FSTs perform tagging faster than the HMMs. 

Since all transducers are approximations of HMMs, they give a lower tagging accuracy than the corresponding HMMs. However, an improvement in accuracy can be expected, since these transducers can be composed with transducers encoding correction rules for frequent errors (sec. 1).

Table 1 compares different transducers on an English test case.

The s+n1-type transducer containing all possible subsequences up to a length of three classes is the most accurate (table 1, last line, s+n1-FST (≤ 3): 95.95 %) but also the largest one (92 463 states). A similar rate of accuracy at a much smaller size can be achieved with the s+n1-type, either with all subsequences up to a length of two classes (s+n1-FST (≤ 2): 95.06 %, 9 796 states) or with subsequences occurring at least once in a training corpus of 100 000 words (s+n1-FST (100K, F1): 95.05 %, 4 709 states).

                            accuracy tagging speed    # states       # arcs  creation
                                in %  in words/sec                               time
    HMM                        96.77         4 590
    n0-FST                     83.53        20 582           1          297    16 sec
    n1-FST                     94.19        17 244          71       21 087    17 sec
    s+n1-FST (20K, F1)         94.74        13 575         927      203 853     3 min
    s+n1-FST (50K, F1)         94.92        12 760       2 675      564 887    10 min
    s+n1-FST (100K, F1)        95.05        12 038       4 709      976 785    23 min
    s+n1-FST (100K, F2)        94.76        14 178         476      107 728     2 min
    s+n1-FST (100K, F4)        94.60        14 178         211       52 624    76 sec
    s+n1-FST (100K, F8)        94.49        13 870         154       41 598    62 sec
    s+n1-FST (1M, F2)          95.67        11 393       2 049      418 536     7 min
    s+n1-FST (1M, F4)          95.36        11 193         799      167 952     4 min
    s+n1-FST (1M, F8)          95.09        13 575         432       96 712     3 min
    s+n1-FST (≤ 2)             95.06         8 180       9 796    1 311 962    39 min
    s+n1-FST (≤ 3)             95.95         4 870      92 463   13 681 113      47 h

    Language:  English
    Corpora:   19 944 words for HMM training, 19 934 words for test
    Tag set:   74 tags, 297 classes
    Types of FST (Finite-State Transducers):
      n0, n1            n0-type (with only lexical probabilities) or n1-type (sec. 2)
      s+n1 (100K, F2)   s-type (sec. 3), with subsequences of frequency ≥ 2, from a
                        training corpus of 100 000 words (sec. 3.2 a), completed with
                        n1-type (sec. 3.3)
      s+n1 (≤ 2)        s-type (sec. 3), with all possible subsequences of length
                        ≤ 2 classes (sec. 3.2 b), completed with n1-type (sec. 3.3)
    Computer:  ultra2, 1 CPU, 512 MBytes physical RAM, 1.4 GBytes virtual RAM

Table 1: Accuracy, speed, size and creation time of some HMM transducers

Increasing the size of the training corpus and the frequency limit, i.e. the number of times that a subsequence must at least occur in the training corpus in order to be selected (sec. 3.2 a), improves the relation between tagging accuracy and the size of the transducer. E.g. the s+n1-type transducer that encodes subsequences from a training corpus of 20 000 words (table 1, s+n1-FST (20K, F1): 94.74 %, 927 states, 203 853 arcs) performs less accurate tagging and is bigger than the transducer that encodes subsequences occurring at least eight times in a corpus of 1 000 000 words (table 1, s+n1-FST (1M, F8): 95.09 %, 432 states, 96 712 arcs).

All transducers in table 1 are faster than the underlying HMM; the n0-type transducer is about five times faster (20 582 vs. 4 590 words/sec)⁸. There is a large variation in speed between the different transducers due to their structure and size.

Table 2 compares the tagging accuracy of different transducers and the underlying HMM for different languages. In these tests the highest accuracy among the transducers was always obtained by s-type transducers, either with all subsequences up to a length of two classes⁹ or with subsequences occurring at least once in a corpus of 100 000 words.



6 Conclusion and Future Research 

The two methods described in this paper allow the approximation of an HMM used for part-of-speech tagging by a finite-state transducer. Both methods have been fully implemented.

The tagging speed of the transducers is up to five 
times higher than that of the underlying HMM. 

The main advantage of transforming an HMM is that the resulting FST can be handled by finite-state calculus and thus be directly composed with other transducers which encode tag correction rules and/or perform further steps of text analysis.

⁸ Since n0-type and n1-type transducers have deterministic states only, a particularly fast matching algorithm can be used for them.

⁹ A maximal length of three classes is not considered here because of the large increase in size and the small increase in accuracy.

    accuracy in %            English    Dutch   French   German  Portug.  Spanish
    HMM                        96.77    94.76    98.65    97.62    97.12    97.60
    n0-FST                     83.53    81.99    91.13    82.97    91.03    93.65
    n1-FST                     94.19    91.58    98.18    94.49    96.19    96.46
    s+n1-FST (20K, F1)         94.74    92.17    98.35    95.23    96.33    96.71
    s+n1-FST (50K, F1)         94.92    92.24    98.37    95.57    96.49    96.76
    s+n1-FST (100K, F1)        95.05    92.36    98.37    95.81    96.56    96.87
    s+n1-FST (100K, F2)        94.76    92.17    98.34    95.51    96.42    96.74
    s+n1-FST (100K, F4)        94.60    92.02    98.30    95.29    96.27    96.64
    s+n1-FST (100K, F8)        94.49    91.84    98.32    95.02    96.23    96.54
    s+n1-FST (≤ 2)             95.06    92.25    98.37    95.92    96.50    96.90
    HMM train. crp. (# wd)    19 944   26 386   22 622   91 060   20 956   16 221
    test corpus (# wd)        19 934   10 468    6 368   39 560   15 536   15 443
    # tags                        74       47       45       66       67       55
    # classes                    297      230      287      389      303      254

    Types of FST (Finite-State Transducers): cf. table 1

Table 2: Accuracy of some HMM transducers for different languages

Future research will mainly focus on this possibility and will include composition with, among others:

• Transducers that encode correction rules (possibly including long-distance dependencies) for the most frequent tagging errors, in order to significantly improve tagging accuracy. These rules can be either extracted automatically from a corpus (Brill, 1992) or written manually (Chanod and Tapanainen, 1995).

• Transducers for light parsing, phrase extraction and other analysis (Ait-Mokhtar and Chanod, 1997).

An HMM transducer can be composed with one or more of these transducers in order to perform complex text analysis using only a single transducer.

We also hope to improve the n-type model by using look-ahead to the following tags¹⁰.

¹⁰ Ongoing work has shown that looking ahead to just one tag is worthless because it makes tagging results highly ambiguous.



References

Ait-Mokhtar, Salah and Chanod, Jean-Pierre (1997). Incremental Finite-State Parsing. In Proceedings of the 5th Conference on Applied Natural Language Processing, pp. 72-79. ACL. Washington, DC, USA.

Bahl, Lalit R. and Mercer, Robert L. (1976). Part of Speech Assignment by a Statistical Decision Algorithm. In IEEE International Symposium on Information Theory, pp. 88-89. Ronneby.

Brill, Eric (1992). A Simple Rule-Based Part-of-Speech Tagger. In Proceedings of the 3rd Conference on Applied Natural Language Processing, pp. 152-155. Trento, Italy.

Chanod, Jean-Pierre and Tapanainen, Pasi (1995). Tagging French - Comparing a Statistical and a Constraint-Based Method. In Proceedings of the 7th Conference of the EACL, pp. 149-156. ACL. Dublin, Ireland.

Church, Kenneth W. (1988). A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text. In Proceedings of the 2nd Conference on Applied Natural Language Processing, pp. 136-143. ACL.

Kaplan, Ronald M. and Kay, Martin (1994). Regular Models of Phonological Rule Systems. In Computational Linguistics, 20:3, pp. 331-378.

Karttunen, Lauri (1995). The Replace Operator. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. Cambridge, MA, USA. cmp-lg/9504032.

Kempe, André and Karttunen, Lauri (1996). Parallel Replacement in Finite State Calculus. In Proceedings of the 16th International Conference on Computational Linguistics, pp. 622-627. Copenhagen, Denmark. cmp-lg/9607007.

Rabiner, Lawrence R. (1990). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. In Readings in Speech Recognition (eds. A. Waibel, K.F. Lee). Morgan Kaufmann Publishers, Inc. San Mateo, CA, USA.

Roche, Emmanuel and Schabes, Yves (1995). Deterministic Part-of-Speech Tagging with Finite-State Transducers. In Computational Linguistics, Vol. 21, No. 2, pp. 227-253.

Viterbi, A.J. (1967). Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm. In IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260-269.

Annex: Regular Expression Operators

Below, a and b designate symbols, A and B designate languages, and R and Q designate relations between two languages. More details on the following operators and pointers to finite-state literature can be found in http://www.rxrc.xerox.com/research/mltt/fst

    $A         Contains. Set of strings containing at least one occurrence of a
               string from A as a substring.
    ~A         Complement (negation). All strings except those from A.
    \a         Term complement. Any symbol other than a.
    A*         Kleene star. Zero or more times A concatenated with itself.
    A+         Kleene plus. One or more times A concatenated with itself.
    a -> b     Replace. Relation where every a on the upper side gets mapped to
               a b on the lower side.
    a <- b     Inverse replace. Relation where every b on the lower side gets
               mapped to an a on the upper side.
    a:b        Symbol pair with a on the upper and b on the lower side.
    (a,b)      1-Level symbol which is the 1-level form (.1L) of the symbol
               pair a:b.
    R.u        Upper language of R.
    R.l        Lower language of R.
    A B        Concatenation of all strings of A with all strings of B.
    A | B      Union of A and B.
    A & B      Intersection of A and B.
    A - B      Relative complement (minus). All strings of A that are not in B.
    A .x. B    Cross product (Cartesian product) of the languages A and B.
    R .o. Q    Composition of the relations R and Q.
    R.1L       1-Level form. Makes a language out of the relation R. Every
               symbol pair becomes a simple symbol (e.g. a:b becomes (a,b), and
               a, which means a:a, becomes (a,a)).
    A.2L       2-Level form. Inverse operation to .1L (R.1L.2L = R).
    0 or []    Empty string (epsilon).
    ?          Any symbol in the known alphabet and its extensions.



